What is this?

1 Exploring The Data

The data come from college football games from 2000 to the present. Each observation represents one play in a game, for which we know the team, the situation (down, time remaining), and the location on the field (yards to go, yards to reach the end zone). We also have information about the type of play called, along with a text description of each play.

Data summary
Name Piped data
Number of rows 1000
Number of columns 48
_______________________
Column type frequency:
character 20
factor 1
logical 2
numeric 25
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
PLAY_ID 0 1.00 12 18 0 1000 0
OFFENSE 0 1.00 3 22 0 153 0
OFFENSE_CONFERENCE 43 0.96 3 17 0 14 0
DEFENSE 0 1.00 3 22 0 158 0
DEFENSE_CONFERENCE 47 0.95 3 17 0 14 0
HOME 0 1.00 3 21 0 125 0
AWAY 0 1.00 3 22 0 179 0
GAME_ID 0 1.00 9 9 0 949 0
DRIVE_ID 0 1.00 10 11 0 1000 0
CLOCK 0 1.00 25 27 0 555 0
PLAY_TYPE 0 1.00 4 26 0 24 0
PLAY_TEXT 0 1.00 16 190 0 995 0
PPA 255 0.74 17 22 0 686 0
WALLCLOCK 1000 0.00 NA NA 0 0 0
STATUS_PERIOD 0 1.00 5 5 0 1 0
HALF 0 1.00 2 11 0 3 0
SCORE_EVENT 513 0.49 7 8 0 5 0
NEXT_SCORE_EVENT_HOME 0 1.00 7 11 0 7 0
HOME_TEAM 0 1.00 3 21 0 125 0
AWAY_TEAM 0 1.00 3 22 0 179 0

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
NEXT_SCORE_EVENT_OFFENSE 0 1 FALSE 7 TD: 426, Opp: 205, No_: 153, FG: 152

Variable type: logical

skim_variable n_missing complete_rate mean count
SCORING 0 1 0.07 FAL: 926, TRU: 74
LAST_DRIVE_HALF 0 1 0.06 FAL: 940, TRU: 60

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
OFFENSE_SCORE 0 1.00 13.63 12.86 0 3.00 10 21.00 77
DEFENSE_SCORE 0 1.00 13.91 13.25 0 3.00 10 21.00 62
DRIVE_NUMBER 0 1.00 14.37 8.62 1 7.00 14 21.00 45
PLAY_NUMBER 0 1.00 4.92 3.47 1 2.00 4 7.00 20
PERIOD 0 1.00 2.53 1.12 1 2.00 3 4.00 5
OFFENSE_TIMEOUTS 793 0.21 2.56 0.79 0 2.00 3 3.00 3
DEFENSE_TIMEOUTS 793 0.21 2.62 0.67 0 2.00 3 3.00 3
YARD_LINE 0 1.00 51.67 25.09 0 32.75 52 70.00 100
YARDS_TO_GOAL 0 1.00 49.94 25.15 0 31.75 55 69.00 100
DOWN 0 1.00 1.78 1.23 -1 1.00 2 3.00 4
DISTANCE 0 1.00 7.77 4.56 -1 5.00 10 10.00 30
YARDS_GAINED 0 1.00 5.62 10.05 -15 0.00 2 8.00 68
MINUTES 0 1.00 7.23 4.61 0 3.00 7 11.00 15
SECONDS 0 1.00 27.78 18.44 0 11.00 29 44.00 59
MINUTES_IN_HALF 2 1.00 14.58 9.13 0 7.00 15 23.00 30
SECONDS_IN_HALF 2 1.00 902.40 548.72 0 420.75 900 1390.75 1800
FLAG_SECONDS_IN_HALF 0 1.00 0.00 0.00 0 0.00 0 0.00 0
FLAG_DOWN 0 1.00 0.07 0.25 0 0.00 0 0.00 1
FLAG_DISTANCE 0 1.00 0.05 0.23 0 0.00 0 0.00 1
FLAG_YARD_LINE 0 1.00 0.00 0.00 0 0.00 0 0.00 0
END_HOME_SCORE 0 1.00 16.93 14.15 0 7.00 14 24.00 77
END_AWAY_SCORE 0 1.00 12.60 12.00 0 3.00 10 20.00 68
SEASON 0 1.00 2008.83 3.67 2001 2006.00 2009 2012.00 2014
NEXT_SCORE_EVENT_HOME_DIFF 366 0.63 4.55 2.98 0 3.00 7 7.00 7
NEXT_SCORE_EVENT_OFFENSE_DIFF 0 1.00 1.83 5.43 -7 -3.00 3 7.00 7

The goal of this analysis is to determine the value of individual plays in terms of expected points.

A sequence in this case is defined as all plays within a half between kickoffs (that is, from the start of the half or the kickoff following a scoring event until the next score or the end of the half). Every sequence ends in one of seven outcomes, which we will define from the perspective of the team on offense. If a team has the ball, any given play will eventually lead to one of these outcomes:

  • Touchdown (TD, 7 points)
  • Field goal (FG, 3 points)
  • Safety (2 points)
  • No score (No_Score, 0 points)
  • Opponent safety (Opp_Safety, -2 points)
  • Opponent field goal (Opp_FG, -3 points)
  • Opponent touchdown (Opp_TD, -7 points)

For instance, on drive 1, Team A receives the opening kickoff, drives for a few plays, and then punts. Team B takes over, which starts drive 2, and they drive for a few plays before also punting. Team A then manages to put together a drive that finally scores. All plays on these three drives form one sequence. The outcome of this sequence is the points scored by Team A - if they score a touchdown, their points from this sequence are 7 (assuming they make the extra point), and Team B’s points from this sequence are -7.

When Team A kicks off to Team B to start drive 4, we start our next sequence, which will end either with one team scoring or at the end of the half. We’ll then start over with a new sequence in the second half.

The outcome for this analysis is the NEXT_SCORE_EVENT. Each play in a given sequence contributes to the eventual outcome of the sequence. Here we can see an example of one game and its drives.

For this game, we can filter to the plays that took place in the lead-up to the first score event. In this case, the first sequence included one drive and ended when Texas A&M kicked a field goal.

If we look at another sequence in the second half, there were multiple drives before a team was able to score.

Our goal is to understand how individual plays contribute to a team’s expected points, or the average points teams should expect to have given their situation (down, time, possession).

For instance, on the first drive of this game, Texas A&M received the ball at their own 25-yard line. The simplest intuition of expected points is to ask: for teams starting at the 25-yard line at the beginning of a game, how many points do they typically go on to score? The answer is to look at all opening drives with 75 yards to go and see what happened - we take the average of the point values of all the scoring events that followed from this situation.

## # A tibble: 7 × 2
## # Groups:   NEXT_SCORE_EVENT_OFFENSE [7]
##   NEXT_SCORE_EVENT_OFFENSE     n
##   <fct>                    <int>
## 1 TD                        2071
## 2 Opp_TD                    1855
## 3 FG                         709
## 4 Opp_FG                     637
## 5 No_Score                    24
## 6 Opp_Safety                  21
## 7 Safety                      13
## # A tibble: 1 × 4
##   YARDS_TO_GOAL DRIVE_NUMBER expected_points     n
##           <dbl>        <dbl>           <dbl> <dbl>
## 1            75            1            0.37  5382

In this case, this means teams with the ball at their own 25 to start the game generally obtained more points on the ensuing sequence than their opponents, so their expected points are slightly positive.
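The averaging step just described can be sketched in tidyverse terms. This is a hedged sketch, not the actual analysis code: `plays` here is a toy stand-in for the real play-by-play frame, and the point-value lookup follows the outcome definitions listed earlier.

```r
library(dplyr)

# Point value of each next-score event, from the offense's perspective
score_values <- c(TD = 7, FG = 3, Safety = 2, No_Score = 0,
                  Opp_Safety = -2, Opp_FG = -3, Opp_TD = -7)

# Toy stand-in for the real play-by-play data
plays <- tibble(
  YARDS_TO_GOAL = 75,
  DRIVE_NUMBER  = 1,
  NEXT_SCORE_EVENT_OFFENSE = c("TD", "TD", "Opp_TD", "FG", "No_Score")
)

# Expected points = average point value of the eventual scoring event
plays %>%
  filter(YARDS_TO_GOAL == 75, DRIVE_NUMBER == 1) %>%
  summarise(expected_points = mean(score_values[NEXT_SCORE_EVENT_OFFENSE]),
            n = n())
# (7 + 7 - 7 + 3 + 0) / 5 = 2 expected points on this toy data
```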

But this is also a function of the down. If we compare the expected points for a team in this situation on first down versus fourth down, we should see a drop: by the time you hit fourth down, if you haven’t moved from the 25, your expected points fall into the negatives, as you will now be punting the ball back to your opponent and it becomes more probable that they score than you.

## # A tibble: 7 × 6
## # Groups:   YARDS_TO_GOAL, NEXT_SCORE_EVENT_OFFENSE [7]
##   YARDS_TO_GOAL NEXT_SCORE_EVENT_OFFENSE   `1`   `2`   `3`   `4`
##           <dbl> <fct>                    <int> <int> <int> <int>
## 1            75 Opp_TD                    1351   321   117    72
## 2            75 TD                        1690   292   103    31
## 3            75 Opp_FG                     469   107    39    22
## 4            75 FG                         576    93    27    14
## 5            75 No_Score                    17     4     1     2
## 6            75 Safety                      10     2    NA     1
## 7            75 Opp_Safety                  13     5     2     1
## # A tibble: 4 × 4
##   YARDS_TO_GOAL  DOWN expected_points     n
##           <dbl> <dbl>           <dbl> <dbl>
## 1            75     1            0.65  4127
## 2            75     2           -0.3    825
## 3            75     3           -0.48   289
## 4            75     4           -2.17   143
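The down-level split can be sketched the same way with a `group_by()`. Again, `plays` is a toy stand-in and the values are invented for illustration.

```r
library(dplyr)

# Point value of each next-score event, from the offense's perspective
score_values <- c(TD = 7, FG = 3, Safety = 2, No_Score = 0,
                  Opp_Safety = -2, Opp_FG = -3, Opp_TD = -7)

# Toy stand-in for the real data
plays <- tibble(
  DOWN = c(1, 1, 4, 4),
  NEXT_SCORE_EVENT_OFFENSE = c("TD", "FG", "Opp_TD", "Opp_FG")
)

# Expected points conditioned on down
plays %>%
  group_by(DOWN) %>%
  summarise(expected_points = mean(score_values[NEXT_SCORE_EVENT_OFFENSE]),
            n = n())
# toy data: down 1 -> (7 + 3) / 2 = 5; down 4 -> (-7 - 3) / 2 = -5
```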

We can apply this type of thinking to the entire field. If we look at all plays, how do expected points vary as a function of a team’s distance from the opponent’s goal line?

This should make sense - if you’re backed up against your own end zone, your opponent becomes more likely to have the next scoring event, either by gaining good field position after you punt or by getting a safety.

But it’s not just position on the field - it’s also about the situation. If we look at how expected points vary by down, we should see that fourth down consistently has the lowest expected points.

We also have other features like distance.

We can do the same for time remaining in the half.

2 Modeling

We’ll now proceed to modeling. I’ll set up training, validation, and test sets split by season.
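A minimal sketch of a season-based split is below; the cutoff years are illustrative placeholders, not the ones actually used, and `plays` is a toy stand-in.

```r
library(dplyr)

plays <- tibble(SEASON = rep(2005:2014, each = 2))  # toy stand-in

# Split by season so held-out seasons never appear in training
train_set      <- plays %>% filter(SEASON <= 2011)
validation_set <- plays %>% filter(SEASON %in% 2012:2013)
test_set       <- plays %>% filter(SEASON == 2014)
```

Splitting on season rather than at random keeps whole games (and any within-season correlation) together on one side of the split.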

I plan to use the following as features in a baseline model.

I’ll now set up a recipe for the baseline model.
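A recipe along these lines could look like the sketch below. The feature names come from the data summary above, but the exact feature set and steps are illustrative assumptions, and `train_set` is a toy stand-in.

```r
library(recipes)
library(tibble)

# Toy stand-in with the columns the sketched recipe uses
train_set <- tibble(
  NEXT_SCORE_EVENT_OFFENSE = factor(c("TD", "Opp_TD", "FG", "Opp_FG")),
  YARDS_TO_GOAL   = c(75, 40, 60, 85),
  DOWN            = c(1, 3, 2, 4),
  DISTANCE        = c(10, 7, 4, 12),
  SECONDS_IN_HALF = c(1800, 900, 120, 300)
)

baseline_rec <- recipe(NEXT_SCORE_EVENT_OFFENSE ~ ., data = train_set) %>%
  step_interact(~ YARDS_TO_GOAL:DOWN) %>%      # illustrative interaction
  step_normalize(all_numeric_predictors())
```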

2.1 Workflows

I’ll define the model I’ll be using here, which is a multinomial logistic regression.

I’ll then create a workflow.
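A sketch of the model spec and workflow, under the assumption that the `nnet` engine is used for the multinomial regression (other engines, such as `glmnet`, are possible). The formula is a stand-in; the real workflow would attach the baseline recipe.

```r
library(parsnip)
library(workflows)

# Multinomial logistic regression spec
multinom_spec <- multinom_reg() %>%
  set_engine("nnet") %>%
  set_mode("classification")

# Workflow bundling preprocessing and model; formula is illustrative
baseline_wf <- workflow() %>%
  add_formula(NEXT_SCORE_EVENT_OFFENSE ~ YARDS_TO_GOAL + DOWN + DISTANCE) %>%
  add_model(multinom_spec)
```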

I’ll manually define resamples based on the seasons - rather than doing k-fold cross validation, I’ll assign each season to be a fold and train and assess the model leaving one season out at a time.
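One way to get leave-one-season-out resamples (an assumption about the implementation; the text says they were defined manually) is rsample’s `group_vfold_cv()`, which makes one fold per group.

```r
library(rsample)
library(tibble)

# Toy stand-in for the training data
train_set <- tibble(SEASON = rep(2012:2014, each = 5), y = rnorm(15))

# One fold per season: each resample holds out a full season for assessment
season_folds <- group_vfold_cv(train_set, group = SEASON)
```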


2.2 Training

Now I’ll train and assess on these resamples.

2.3 Resampling Performance

## # A tibble: 14 × 4
##    id            .metric     .estimator .estimate
##    <chr>         <chr>       <chr>          <dbl>
##  1 holdout: 2012 mn_log_loss multiclass     1.24 
##  2 holdout: 2013 mn_log_loss multiclass     1.24 
##  3 holdout: 2014 mn_log_loss multiclass     1.24 
##  4 holdout: 2015 mn_log_loss multiclass     1.25 
##  5 holdout: 2016 mn_log_loss multiclass     1.23 
##  6 holdout: 2017 mn_log_loss multiclass     1.24 
##  7 holdout: 2018 mn_log_loss multiclass     1.23 
##  8 holdout: 2012 roc_auc     hand_till      0.701
##  9 holdout: 2013 roc_auc     hand_till      0.707
## 10 holdout: 2014 roc_auc     hand_till      0.694
## 11 holdout: 2015 roc_auc     hand_till      0.707
## 12 holdout: 2016 roc_auc     hand_till      0.705
## 13 holdout: 2017 roc_auc     hand_till      0.701
## 14 holdout: 2018 roc_auc     hand_till      0.713

2.4 Inference

Understanding partial effects from a multinomial logit is already difficult, and the interactions I’ve included make it harder still. I’ll look at predicted probabilities using an observed-values approach for particular features (using a sample rather than the full dataset to save time). This means taking the fitted model, setting the feature of interest to a given value for every observation, and averaging the predicted probability of each outcome across all observations.
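The observed-values loop can be sketched as a small helper. This is a hypothetical function, not the code used here: `fit` is assumed to be a fitted tidymodels workflow whose `predict(..., type = "prob")` returns one probability column per outcome.

```r
library(dplyr)
library(purrr)

# For each value in `grid`, overwrite `feature` for every row, predict,
# and average the class probabilities across all observations
observed_values_profile <- function(fit, data, feature, grid) {
  map_dfr(grid, function(v) {
    altered <- mutate(data, !!feature := v)
    predict(fit, altered, type = "prob") %>%
      summarise(across(everything(), mean)) %>%
      mutate(!!feature := v, .before = 1)
  })
}

# e.g. observed_values_profile(fitted_wf, sample_n(plays, 5000),
#                              "YARDS_TO_GOAL", seq(5, 95, by = 10))
```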

How is the probability of the next scoring event influenced by where the offense has possession?

How is this affected by the down?

How does this translate into expected points?

2.5 Validation Set

We can evaluate the model via our leave-one-out approach, but we’ll also predict the validation set as an additional check. I’ll compare performance relative to a null model that simply predicts the incidence rate of each outcome in the training set.
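As a sanity check on the null-model comparison: if the validation outcomes occur at the same rates as in training, the null model’s mean log loss is just the entropy of the outcome distribution. The rates below are illustrative, not the real ones.

```r
# Illustrative incidence rates (not the real ones)
p_train <- c(TD = 0.40, Opp_TD = 0.35, FG = 0.15, Opp_FG = 0.10)

# Constant predictions at the training rates give a mean log loss equal
# to the entropy of that distribution
null_log_loss <- -sum(p_train * log(p_train))
round(null_log_loss, 2)  # ~1.25 for these illustrative rates
```

Any model worth keeping should beat this baseline.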

What’s the log loss for each outcome?

3 Examining Results

I’ll now start diving into the predictions for individual plays as a means to evaluate plays and teams.

It’s worth noting that we might see some season-level differences that make comparison across seasons difficult, since the predictions are all coming from slightly different models due to resampling.

Next, I’ll get expected points added for all plays. This part is a little wobbly, due to data quality issues in defining sequences. The basic thought here is: at the start of a play, we know the expected points for a team in that situation, EP_Pre. We then look to the next play to see the expected points for the team after the result of the previous play, EP_Post. EP_Added is the difference between these two, from the perspective of the offense.

This means that if the ball is turned over, but not scored, the team on offense becomes the defense and the sign of the expected points on the next play flips for their calculation. For plays that produce scores, I set EP_Post equal to the points scored on the play: 7 for touchdowns, 3 for field goals, 2 for safeties.
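The sign-flip logic can be sketched on a toy three-play sequence (the EP values are invented): Team A runs two plays, turns the ball over, and Team B kicks a field goal.

```r
library(dplyr)

# Points credited on scoring plays
score_points <- c(TD = 7, FG = 3, Safety = 2)

# Toy sequence: Team A runs two plays, turns it over, Team B kicks a FG
plays <- tibble(
  OFFENSE     = c("A", "A", "B"),
  EP_PRE      = c(0.4, 1.1, 1.8),
  SCORE_EVENT = c(NA, NA, "FG")
)

plays %>%
  mutate(
    next_offense = lead(OFFENSE),
    next_ep      = lead(EP_PRE),
    EP_POST = case_when(
      !is.na(SCORE_EVENT)     ~ unname(score_points[SCORE_EVENT]),
      OFFENSE == next_offense ~ next_ep,
      TRUE                    ~ -next_ep  # possession changed: flip the sign
    ),
    EP_ADDED = EP_POST - EP_PRE
  )
# toy result: +0.7, -2.9 (turnover), +1.2 (field goal)
```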

3.1 Game Results

I’ll look at a few games, play by play, to get a sense of how this is looking. I’ll pick one game completely at random, in no way influenced by my fandom.

From this game, we can look at the top ten positive points added plays and the top ten negative points added plays.

3.2 Team Offenses and Defenses by Game

Having scored all individual plays, we can now roll this up to whatever level of analysis we’re interested in. We can, for instance, look at a team’s offense game by game over this time period. Continuing to select a team at random, we’ll look at A&M’s offense by game.

We can do the same thing, but looking at a team’s defense - what was the expected points added for the opposing team’s offense? In this case, negative is good - it means the opposing team’s offense didn’t perform well.

I’ll put these side by side for a really good team, like Bama. A good performance is one where the team’s offensive EPA is higher than the EPA yielded by its defense.

Okay let’s look at a mediocre team, like Kansas.

3.3 Team Offenses and Defenses by Year

We can then aggregate this to a team’s offensive performance within a full year to rate their overall offensive/defensive efficiency. We can also break this down by passing vs rushing plays on both sides of the ball.

For the purpose of evaluating a team’s defense, I’ll flip the sign of the points, which are currently scored from the perspective of the offense, so that positive is always good for a team.
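That sign flip amounts to negating the mean offensive EPA when aggregating by defense. A toy sketch:

```r
library(dplyr)

# Toy play-level EPA, scored from the offense's perspective
plays <- tibble(
  DEFENSE  = c("X", "X", "Y"),
  SEASON   = 2014,
  EP_ADDED = c(0.5, -0.3, 1.2)
)

# Negate so that positive numbers are good for the defense too
defense_epa <- plays %>%
  group_by(DEFENSE, SEASON) %>%
  summarise(def_epa_per_play = -mean(EP_ADDED), .groups = "drop")
# X: -0.1, Y: -1.2 on this toy data
```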

Putting this all together, I can rank a team’s offense/defense by year and then sort to see where the top teams of all time tend to rank. Here are the top 50 teams using a composite score of both, including only offenses and defenses with at least 400 plays in a season.

And here are the 25 worst teams using the same criterion.

We can visualize all of this by placing every team based on its overall offensive/defensive efficiency.


4 Backtesting

I’ll next use the model we trained on 2012-2020 to evaluate prior seasons.